DOCUMENT RESUME

ED 243 949                                              TM 840 264

AUTHOR        Cziko, Gary A.; Lin, Nien-Hsuan Jennifer
TITLE         The Construction and Analysis of Short Scales of
              Language Proficiency: Classical Psychometric, Latent
              Trait, and Nonparametric Approaches.
PUB DATE      Apr 84
NOTE          37p.; Paper presented at the Annual Meeting of the
              American Educational Research Association (68th, New
              Orleans, LA, April 23-27, 1984).
PUB TYPE      Speeches/Conference Papers (150) -- Reports -
              Research/Technical (143)
EDRS PRICE    MF01/PC02 Plus Postage.
DESCRIPTORS   Adults; *English (Second Language); Foreign Students;
              Higher Education; Item Analysis; *Language
              Proficiency; Latent Trait Theory; Measurement
              Techniques; Nonparametric Statistics; *Rating Scales;
              *Test Construction; Test Theory
IDENTIFIERS   Illinois English Placement Test; Test of English as a
              Foreign Language



ABSTRACT

This study used classical psychometric, latent trait, and nonparametric approaches to analyze 13- and 14-item scales of English language proficiency. Tests of English listening comprehension (dictation) and reading ("copytest") were constructed by modifying the standard dictation testing procedure to create items of text segments which varied widely in both length and difficulty. Both the dictation and copytest were found to be homogeneous, cumulative scales of language proficiency with high reliability and validity. Log ability scores provided by Rasch analyses were found to correlate better with other measures of language proficiency than did the dictation and copytest raw scores. These findings indicate that the two language testing techniques investigated provide a useful innovative approach to measuring general aspects of language proficiency. The theoretical and practical advantages of this approach over other language proficiency measurement techniques are discussed as well as implications for measuring language proficiency and other cognitive variables. (Author)



***********************************************************************
* Reproductions supplied by EDRS are the best that can be made        *
* from the original document.                                         *
***********************************************************************






Scales of Language 




The Construction and Analysis of Short Scales of
Language Proficiency: Classical Psychometric,
Latent Trait, and Nonparametric Approaches

Gary A. Cziko and Nien-Hsuan Jennifer Lin

University of Illinois at Urbana-Champaign



U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)
This document has been reproduced as received from the person or organization originating it.
Minor changes have been made to improve reproduction quality.

Points of view or opinions stated in this document do not necessarily represent official NIE position or policy.

"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY

TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC)."



Paper presented at the annual meeting of the American Educational Research Association, New Orleans, April 1984.






Abstract

This study used classical psychometric, latent trait, and nonparametric approaches to analyze 13- and 14-item scales of English language proficiency. Tests of English listening comprehension (dictation) and reading ("copytest") were constructed by modifying the standard dictation testing procedure to create items of text segments which varied widely in both length and difficulty. Both the dictation and copytest were found to be homogeneous, cumulative scales of language proficiency with high reliability and validity. Log ability scores provided by Rasch analyses were found to correlate better with other measures of language proficiency than did the dictation and copytest raw scores. These findings indicate that the two language testing techniques investigated provide a useful innovative approach to measuring general aspects of language proficiency. The theoretical and practical advantages of this approach over other language proficiency measurement techniques are discussed as well as implications for measuring language proficiency and other cognitive variables.



The Construction and Analysis of Short Scales of Language
Proficiency: Classical Psychometric, Latent Trait, and
Nonparametric Approaches

Both the theory and practice of assessing second language proficiency have undergone marked changes over the last 40 years. Spolsky (1978) has classified language testing theory and practices into three major trends or periods to characterize the major changes which have taken place in the field. The first "prescientific" period relied primarily on teachers' subjective assessments of their students' ability to speak and/or write the language. During this period, generally before the 1960s in the U.S., there was little concern with the statistical reliability or validity of language assessment but rather an assumption that anyone proficient enough to teach a language would also be qualified to assess students' proficiency in it. The publication of Lado's Language Testing in 1961 marked the beginning of a second era in language testing, the "psychometric-structuralist" period, which was primarily concerned with (a) constructing tests which tested knowledge of discrete linguistic structures and rules, and (b) doing so with demonstrable statistical reliability and validity. More recently, however, there has been a reaction against this approach resulting in what Spolsky has termed the "integrative-sociolinguistic" approach to language testing which, while not discounting the importance of psychometric reliability and validity, puts a major emphasis on testing language as a functional, communicative tool as used in genuine communicative settings.

It is of particular interest to consider the dictation procedure as a language testing method from the perspective of these three different approaches to language testing. While the practice of having students listen to and write down second-language passages appears to have been fairly widely used as both a teaching and [illegible] 1970s, it was generally later ignored by the true "psychometric-structuralists." Lado (1961, p. 34) describes dictation as a [illegible] language proficiency since both [illegible] examiner and, since the context of the passage may [illegible] recognition of words which might not be recognized in isolation. Recently, the [illegible] integrative-sociolinguistic approach to language testing has revived the use of dictation, which is now seen by many as a convenient and valid language testing procedure which provides a useful measure of general language proficiency for those students who are familiar with the written form of the language.

Much of the impetus for the revival of dictation as a language testing procedure has come from the work of John Oller and his associates who have argued convincingly on both theoretical and empirical grounds for the convenience and validity of the dictation test (Oller, 1973, 1979; Oller & Streiff, 1975). Oller (1979, pp. 16-33) conceptualizes language proficiency as a "pragmatic expectancy grammar", i.e., a system of knowledge and rules which allow one to predict the form of language as it is being heard or read, which permits comprehension as a constructive (or active) cognitive process (see Neisser, 1967, Clark & Clark, 1977, and van Dijk & Kintsch, 1983, for detailed theoretical considerations of language comprehension as a constructive, predictive process). This view of language proficiency is supported by a number of empirical studies (see Clark & Clark, 1977, pp. 210-215) which have demonstrated that (a) speech perception is an active process which requires the knowledge and use of top-down contextual constraints, and (b) the accuracy of recall of auditorily presented sentences similarly depends on knowledge of the lexical, syntactic, and semantic systems of the language. Thus, one of the very reasons for which Lado criticized dictation (i.e., it provides context which makes it easier to identify individual words) may be considered now to be dictation's most important characteristic as a language testing procedure since it is sensitive to one's integrative knowledge of the phonological, syntactic, and semantic systems of the language which permits its anticipation, enabling both comprehension and production.

There also appear to be a number of practical reasons for the renewed popularity of the dictation procedure as a measure of second language proficiency. A dictation test is relatively easy to construct, requiring only the location of a passage of appropriate difficulty and style for the students to be tested and its division into segments (of usually 7 to 12 words) for presentation. It is therefore considerably easier to construct than multiple-choice tests (see Oller, 1979, Chap. 9) and very adaptable to the needs of individual classes. Thus, it is possible to create a dictation test with relative ease using an expository or narrative text or dialogue in either a formal or informal speech register at an appropriate level of difficulty (in terms of syntactic structure and vocabulary) including appropriate content. This adaptability of the dictation procedure gives it a number of advantages over available standardized language tests, particularly where formative evaluations of students' progress are desired and where specialized language skills are emphasized (e.g., the ability to read and write scientific articles in a specific technical [illegible]).

Nevertheless, there are a number of factors which limit the usefulness of the dictation procedure. Among these are:

1. In choosing a text for a dictation passage, there is no simple formula for deciding how difficult the text should be. This is of particular concern when a group of students representing a wide range of second language proficiency is to be tested.

2. While the usual procedure for scoring dictation involves subtracting one point for each insertion, deletion, permutation, and substitution at the word level, there is no clear theoretical or empirical basis for this particular weighting of all types of errors. Also, since total dictation scores which are equal may represent quite different patterns of responses, total test scores may not be easily comparable among examinees. For example, a score of 70 on a dictation test of 100 words may indicate quite different levels of language proficiency depending on whether missed points are primarily due to (a) omitted or inserted content words (e.g., nouns, verbs) which seriously affect the comprehensibility of the passage (and therefore would seem to indicate poor comprehension of the passage by the examinee) or (b) omitted or inserted functors (e.g., articles, conjunctions, prepositions) which are less important to the meaning of the text.

3. Although a dictation test is relatively easy to construct and administer, it requires considerably more time and care to score than most other tests requiring written responses (e.g., multiple-choice or cloze tests) if each individual word is to be scored.

4. The dictation procedure is limited to measuring listening comprehension and therefore cannot be used to assess language proficiency via the modality of reading.
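The conventional word-level scoring described in the second limitation can be sketched as follows. This is only an illustrative sketch, not a procedure given in the paper: the function names and the example sentences are hypothetical, errors are counted with a word-level edit distance (insertions, deletions, substitutions), and a permutation is not treated as its own error type, so a swapped word pair costs two points here.

```python
def word_errors(target: str, response: str) -> int:
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn the response into the target (case is ignored)."""
    t = target.lower().split()
    r = response.lower().split()
    # prev[j]: distance between the target words seen so far and the
    # first j response words (row for zero target words to start).
    prev = list(range(len(r) + 1))
    for i, t_word in enumerate(t, start=1):
        curr = [i]
        for j, r_word in enumerate(r, start=1):
            cost = 0 if t_word == r_word else 1
            curr.append(min(prev[j] + 1,          # target word omitted
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def conventional_score(target: str, response: str) -> int:
    """One point per target word, minus one point per word-level error."""
    return max(0, len(target.split()) - word_errors(target, response))


# Hypothetical example: equal totals, very different kinds of error.
target = "the dog chased the cat"
missing_content_word = "the chased the cat"   # noun omitted
missing_functor = "dog chased the cat"        # article omitted
print(conventional_score(target, missing_content_word))
print(conventional_score(target, missing_functor))
```

Both hypothetical responses receive the same total even though only the first loses a content word, which is the comparability problem the second limitation raises.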

Cziko (1982) felt that many of these shortcomings of the dictation procedure for measuring second language proficiency could be eliminated by making some basic changes to the way in which dictation is normally administered and scored and by developing an analogous testing procedure which involved reading instead of listening as used in the dictation procedure. The principal changes to the dictation procedure involved presenting segments of the test text at widely varying lengths, from 2 to 21 words, and scoring each segment as a single item (right or wrong) instead of scoring each individual word. Cziko's major findings (as they relate to the four limitations of the traditional dictation procedure described above) were:

1. Varying the length of segments was effective in manipulating their difficulty, resulting in a dictation test with a wide range of item difficulties appropriate for testing students possessing a wide range of language proficiency.

2. Awarding one point for each correct segment resulted in scores based on relatively few items with surprisingly high reliability and validity. In addition, the procedure resulted in a Guttman scale of high reproducibility and scalability so that any given total score represented, with few exceptions, the same pattern of responses to each individual item (segment).

3. Scoring by segment was found to be three to four times faster than the conventional word-by-word scoring procedure.

4. The analogous test involving reading and writing (called a "copytest"), administered to a smaller group of students, did not have comparably high reproducibility or scalability. No analysis of its reliability or validity was undertaken.
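The idea of a Guttman scale of high reproducibility in the second finding can be made concrete with a small sketch. The paper does not say how scale errors were counted, so the code below assumes one common convention: each examinee's responses are compared with the ideal pattern for his or her total score (passing exactly that many of the easiest items), and the coefficient is one minus the proportion of mismatches. The function name and the data are hypothetical.

```python
def reproducibility(responses):
    """Coefficient of reproducibility for a list of 0/1 response rows.

    Assumption: an error is any response that differs from the ideal
    cumulative pattern implied by the examinee's total score.
    """
    n_items = len(responses[0])
    # Order items from easiest to hardest by number of passes.
    passes = [sum(row[i] for row in responses) for i in range(n_items)]
    order = sorted(range(n_items), key=lambda i: -passes[i])
    errors = 0
    for row in responses:
        score = sum(row)
        for rank, item in enumerate(order):
            ideal = 1 if rank < score else 0
            if row[item] != ideal:
                errors += 1
    return 1 - errors / (len(responses) * n_items)


# Hypothetical data: four perfectly cumulative patterns plus one
# examinee who fails the two easier items but passes the hardest.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]
print(round(reproducibility(data), 3))
```

With a perfectly cumulative scale the total score alone determines the whole response pattern, which is why equal totals become comparable across examinees.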



Since [illegible] investigate whether this modification of the dictation procedure would result in a unidimensional, cumulative scale of language proficiency using Guttman scalogram analysis, it should be mentioned here that Mokken (1971) has noted a number of problems associated with the use of the indices of scalability most often used to evaluate Guttman scales and has demonstrated that the index of test homogeneity (H) proposed by Loevinger (1947, 1948) serves as a clearly better criterion of scalability. Also, Mokken (1971) and Mokken and Lewis (1982) have described a new index (H_i) which is useful in evaluating the homogeneity and scalability of individual items within a given scale of items.

The purpose of the present study was to replicate the findings of Cziko (1981) that the modifications to the standard dictation test described above provide a practical and convenient procedure for obtaining reliable and valid short scales of language proficiency involving listening and reading. Unlike the previous study, however, three different approaches were used to analyze the resulting scales of language proficiency. These three approaches included (a) classical psychometric procedures for item analysis and reliability estimation, (b) a one-parameter latent trait (Rasch) model, and (c) a nonparametric scaling approach similar to Guttman's (1947, 1950) and Loevinger's (1947, 1948) concepts of a unidimensional, cumulative, and homogeneous scale which has been further refined by Mokken (1971) and Mokken and Lewis (1982).



Subjects

A total of [illegible] students representing four levels of English [illegible]. The beginning group (Group BEG, 13 students) and [illegible] (Group INT, 12 students) were adults studying at the Intensive English Institute [illegible] scored at the lowest level on the Illinois English Placement Test (IEPT) of all IEI students while Group INT had scored at a higher intermediate level. Neither Group BEG nor Group INT was enrolled in regular university courses. The advanced group (Group ADV) were 25 foreign students enrolled in the University of Illinois who had scored high enough on the Test of English as a Foreign Language (TOEFL) to be admitted to their desired program of study but not high enough to be exempt from courses in English as a second language (ESL). A group of native English speakers (Group NS, 17 subjects) was selected from American undergraduates enrolled in an English rhetoric course. Thus, these four groups of subjects represented an extremely broad range of English proficiency varying from extremely limited (Group BEG) to educated native-speaker competence (Group NS).
Materials

The text used for both the [illegible] version of the introductory paragraph [illegible] "[illegible]pearing Wildlife" taken from a reader for intermediate and advanced ESL students (Lugton, 1978, p. 221). After pilot testing the passage as a dictation test with a small number of students comparable to students of Group INT, it was revised so that words which appeared to be too difficult were either omitted or changed to more common English synonyms.

Since we wanted to create a set of items representing a wide range of [illegible] to manipulate the difficulty of the segments of the passage which were to be used as test items. While it was recognized that length [illegible] comprehended and recalled, it was apparent that manipulating [illegible] of the segments was by far the most convenient way of providing a set of items with widely ranging difficulty levels. Thus, the test version of the passage consisted of 105 words divided into 13 segments of generally increasing length ranging from 2 words (first segment) to 19 words (last segment; see Appendix). These 13 segments were formed by dividing the text at the natural division points provided by phrase, clause, or sentence boundaries.

Procedure

The administration of each test involved three complete presentations of the test passage, either auditorily via an audiotape recording (for the dictation) or visually via typed transparencies on an overhead projector (for the copytest). The entire testing session lasted approximately 15 minutes for each test and included (a) test instructions, (b) the first presentation of the test passage during which the entire passage was presented to the subjects without interruption, (c) the second presentation of the test passage divided into segments which included pauses at the end of each of the 13 segments to allow the students time to write what they had heard or read, (d) the third presentation of the test passage with pauses at the end of each of the seven sentences of the passage to allow the students to check and correct what they had written, and (e) a 60-second pause after the third presentation for final corrections.

For all dictation presentations of the test passage, the passage was read at a speed considered normal for a careful oral reading of a written text. For all copytest presentations, the time taken to read the text for the dictation test (or portions of the text for the second and third presentations) was used as the visual presentation time for the text and portions of the text. The length of pauses used for [illegible] presentations was determined by estimating the [illegible] students to write and correct their work. For the [illegible] the length of each pause in seconds after each segment [illegible] dividing the number of letters in the segment being [illegible] third presentation, the length of each pause in seconds [illegible] dividing the number of letters in the sentence being [illegible]

Since the same passage was used for both the [illegible] and since the same students took both tests, the order [illegible] the tests was counterbalanced with approximately half [illegible] taking the dictation test first and the remaining half [illegible] test first. For all students there was an interval [illegible] the administration of the two tests. Both tests were [illegible] regular ESL class period to each of the four groups [illegible]

Scoring

For both the dictation and copytest, each of [illegible] considered one item. Students were given one point [illegible] written without error (including spelling errors) [illegible] score for each test. While a number of researchers [illegible] less strict scoring procedure which allows for some [illegible] gives at least partial credit to responses which retain [illegible] the test passage (see Oller, 1979; Savignon, 1982) [illegible] procedure is relatively time consuming and likely to [illegible] it requires a subjective judgment on the part [illegible] errors should be given credit. Also, Cziko (1982) [illegible] segment [illegible] used for a dictation test [illegible] with high reliability and validity and that the [illegible] scored in this way formed a Guttman scale of high reproducibility and scalability.
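The strict segment scoring described above (one point only for a segment written entirely without error) might be sketched as below. The function name and the example segments are hypothetical and are not taken from the test passage; the sketch also assumes that differences in capitalization and spacing are ignored, which the paper does not specify.

```python
def score_by_segment(target_segments, response_segments):
    """Return one 0/1 item score per segment.

    A segment earns a point only if every word matches the target,
    so a single spelling error makes the whole segment wrong.
    Assumption: case and extra whitespace are ignored.
    """
    def normalize(text):
        return text.lower().split()

    return [
        int(normalize(target) == normalize(response))
        for target, response in zip(target_segments, response_segments)
    ]


# Hypothetical two-segment example.
targets = ["Many animals", "are in danger of disappearing"]
responses = ["many animals", "are in danger of dissapearing"]
item_scores = score_by_segment(targets, responses)
print(item_scores, sum(item_scores))
```

Because each segment yields a single right-or-wrong item, the scorer makes no judgment about which errors deserve partial credit.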

Results

There was a small but quite consistent order effect for the two tests. This was indicated by the finding that, regardless of group, those students who took the dictation after having taken the copytest did better on the dictation than those students who took the dictation first. These differences in group means (out of a maximum possible score of 13) were .98, 1.17, 1.35, and .12 for Groups BEG, INT, ADV, and NS, respectively. The same was also generally true of the copytest with the exception of Group ADV, for whom the order of administration had virtually no effect. Order differences in group means for the copytest were .42, .67, -.04, and .70 for Groups BEG, INT, ADV, and NS, respectively. However, none of the above differences was statistically significant (p > .05) when tested using the t statistic and the directional alternative hypothesis that group means would be higher for those students who took a given test second. Therefore, all further analyses were done without regard to order of administration.

Item and Scale Analyses

Three different approaches were used to analyze the item and scale characteristics of the dictation and copytest (see Table 1). Since Rasch analysis requires the exclusion of students receiving zero or perfect scores on a test, these extreme subjects were excluded from the analyses of all three approaches so that all results would be based on the same students for each of the two tests. Thus, data from 47 and 56 students were included in the following analyses of the dictation and copytest, respectively.

First, standard psychometric indices were computed for each item using the reliability procedure of SPSS (Hull & Nie, 1981). The indices included item easiness (p, i.e., the proportion of students passing each item) [illegible]. These analyses indicated that both tests included items of widely ranging easiness (.09 < p < .83 for the dictation, .11 < p < .98 for the copytest) with the first three items of each test having noticeably lower values of [illegible] than the remaining items of each test. It was also found that while the values of "alpha if deleted" were highest for these first three items of each test, variation in these values across items was very small.
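Two of the classical indices named above, item easiness and "alpha if deleted", can be illustrated with a short sketch. It is not the SPSS reliability procedure itself: the function names and data are hypothetical, alpha is taken to be Cronbach's coefficient alpha, and the code assumes total scores are not all identical (otherwise the total variance is zero).

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of 0/1 item-score rows."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_analysis(rows):
    """Easiness p and alpha-if-deleted for every item."""
    k = len(rows[0])
    results = []
    for i in range(k):
        p = sum(row[i] for row in rows) / len(rows)
        # Drop item i and recompute alpha on the remaining items.
        rest = [[x for j, x in enumerate(row) if j != i] for row in rows]
        results.append({"item": i + 1, "p": p,
                        "alpha_if_deleted": cronbach_alpha(rest)})
    return results


# Hypothetical responses of four examinees to three items.
rows = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(rows))
for entry in item_analysis(rows):
    print(entry)
```

An item whose removal raises alpha above the full-scale value is one that fits the rest of the scale relatively poorly, which is how the "alpha if deleted" values are read in the paragraph above.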



Insert Table 1 about here

In the second approach to analyzing these scales, the Rasch model (a one-parameter logistic latent trait model) was used to fit the data generated by each of the two tests using Wright and Mead's (1977) BICAL computer program. A one-parameter item response model was considered appropriate since guessing was not a factor influencing performance on the tests (all responses were supplied by the students, not selected), the number of subjects was relatively small, and [illegible] with similar scales (Cziko, 1981) revealed that they resembled Guttman scales with each item having similar high discriminatory power. Dividing the examinees into two groups (24 low scorers and 23 high for the dictation; 30 low and 26 high for the copytest), total fit and discrimination indices were computed for each item (see Table 1). Except for the third copytest item, all items provided quite acceptable total fit indices which were well within three standard error units of the expected total fit values of unity (standard errors of expected total fit were .21 and .19 for the dictation and copytest, respectively). The Rasch analyses also indicated that with five exceptions [illegible] the discriminating power [illegible] both the dictation and copytest were comparable. With a value of unity indicating that an item's observed characteristic curve is equal in steepness to the best fitting logistic curve for all items, the first and third items of the dictation as well as the third copytest item were found to have relatively flat curves while the fifth item of the dictation and the sixth and eighth items of the copytest were found to have relatively steeper curves and consequently higher discriminating power than the other items of their respective scales.
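The one-parameter logistic model referred to above can be sketched in a few lines. This is not the BICAL estimation algorithm: the function names are hypothetical, and the log-odds of the raw score is shown only as a simple starting value for a log ability estimate. It does make visible why examinees with zero or perfect scores have to be excluded.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model,
    with person ability and item difficulty both in logits."""
    return 1 / (1 + math.exp(-(ability - difficulty)))


def initial_log_ability(raw_score, n_items):
    """Log-odds of the raw score, a simple first approximation to
    log ability. It is undefined for zero and perfect scores."""
    if raw_score <= 0 or raw_score >= n_items:
        raise ValueError("zero and perfect scores have no finite log ability")
    return math.log(raw_score / (n_items - raw_score))


# An examinee whose ability equals the item difficulty passes half the time.
print(rasch_probability(0.0, 0.0))
# Hypothetical raw score of 10 on a 13-item test.
print(round(initial_log_ability(10, 13), 3))
```

Because every item shares the same slope in this model, an item whose observed curve is noticeably flatter or steeper than the common logistic curve shows up as a discrimination index below or above unity.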

Finally, the H_i statistic formulated by Mokken (1971) was calculated for each item using Cziko's (1984) computer program. This statistic is similar to Loevinger's H (Loevinger, 1947, 1948) in that it provides an indication of scale homogeneity and scalability. However, whereas Loevinger's H can only be used to evaluate the homogeneity or scalability of a complete set of items, Mokken's H_i provides a way of evaluating each item's contribution to the homogeneity or scalability of the scale of which it is a part. Using the criteria proposed by Mokken (p. 185) of considering values of .5 or above as evidence of strong scalability, .4 to .5 as evidence of medium scalability, and .3 to .4 as indicating weak scalability, we notice 19 "strong" items, 1 "medium" item, 3 "weak" items, and 2 nonscale items with H_i of less than .3. Again, all weak or nonscale items were found among the first three items of each test.
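A minimal sketch of how H and H_i can be computed for dichotomous items follows. It is not Cziko's (1984) program: the function names and data are hypothetical, and it uses the usual formulation in which, for each item pair, the observed number of Guttman errors (failing the easier item while passing the harder one) is compared with the number expected if the two items were independent. It assumes no item is passed by everyone or by no one.

```python
from itertools import combinations


def mokken_h(rows):
    """Return (H, [H_i, ...]) for a list of 0/1 response rows."""
    n, k = len(rows), len(rows[0])
    p = [sum(row[i] for row in rows) / n for i in range(k)]
    observed = [0.0] * k
    expected = [0.0] * k
    observed_total = expected_total = 0.0
    for a, b in combinations(range(k), 2):
        easy, hard = (a, b) if p[a] >= p[b] else (b, a)
        # Guttman error: easier item failed, harder item passed.
        f = sum(1 for row in rows if row[easy] == 0 and row[hard] == 1)
        e = n * (1 - p[easy]) * p[hard]  # expected under independence
        for item in (a, b):
            observed[item] += f
            expected[item] += e
        observed_total += f
        expected_total += e
    h_items = [1 - observed[i] / expected[i] for i in range(k)]
    return 1 - observed_total / expected_total, h_items


def scalability_label(h):
    """Thresholds quoted in the text from Mokken (1971)."""
    if h >= 0.5:
        return "strong"
    if h >= 0.4:
        return "medium"
    if h >= 0.3:
        return "weak"
    return "nonscale"


# Hypothetical data: cumulative patterns plus one deviant examinee.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 1]]
h, h_items = mokken_h(data)
print(round(h, 3), [scalability_label(x) for x in h_items])
```

A value of 1 means no Guttman errors at all, and a value near 0 means an item is involved in about as many errors as chance alone would produce.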



In comparing the above three approaches to scale and item analysis, all three showed a high degree of convergence in signalling items 1, 2, and 3 of the dictation and items 1 and 3 of the copytest as items with a relatively poor fit to the scale defined by the other items. However, while the item-total correlation and scalability for item 2 of the dictation were relatively low (r = .38, H_i = .34), this item nevertheless had close to expected fit and discrimination indices according to the Rasch analysis. Also, while item 5 of the dictation and item [illegible] of the copytest showed a much steeper discrimination curve than other items in their respective scales, all other indices of fit for these two items appeared quite acceptable.

Indices of the reliability and homogeneity of the dictation and copytest are given in Table 2. In spite of the fact that each test consisted of only 13 items and that students with extreme scores were excluded from these analyses, all estimates of psychometric reliability were in the range of .82 to .90. In addition, the dictation and copytest were found to have H values of .50 and .58, respectively, indicating that they comprised what could be considered strong homogeneous scales (Mokken & Lewis, 1982, p. 422).



Insert Table 2 about here 



Two principal techniques were employed to assess the construct validity of the two language proficiency measures. These included (a) comparing the mean dictation and copytest scores of the four groups of students, and (b) examining the correlations of the dictation and copytest scores with other tests of English reading and listening comprehension.



A summary of the performance of the four groups on the dictation and copytest is given in Table 3. For both measures, the relative magnitudes of all group means were as predicted with Group BEG scoring lowest, followed in order of increasing mean scores by Groups INT, ADV, and NS. Differences in means between adjacent groups were shown to be quite large when divided by the pooled standard deviation of test scores for all four groups. The resulting effect size (ES) was well above unity for each comparison, with the largest values obtained when comparing Groups ADV and NS. Confidence intervals of the difference between adjacent group means (C = .95) ranged from a lower limit of .77 (Groups INT and ADV on the copytest) to 8.01 (Groups ADV and NS on the dictation). These analyses provide evidence of the validity of the dictation and copytest in that the ordering of the group means was consistent with the ordering that a valid test of English proficiency would be expected to produce and differences between adjacent group means were large and statistically significant. In addition, all but one student of Group NS scored 10 or above on each test whereas the majority of students in Groups BEG and INT scored 5 or below on each test.
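The effect size used above (a difference between adjacent group means divided by a pooled standard deviation) can be sketched as follows. The scores are hypothetical, not the study's data, and the sketch assumes that "pooled" means the within-group standard deviation pooled over all groups; the paper does not spell out the formula.

```python
import math
from statistics import mean, variance


def pooled_sd(groups):
    """Within-group standard deviation pooled over all groups,
    weighting each group's variance by its degrees of freedom."""
    numerator = sum((len(g) - 1) * variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return math.sqrt(numerator / denominator)


def effect_size(lower_group, higher_group, all_groups):
    """Difference in means expressed in pooled-SD units."""
    return (mean(higher_group) - mean(lower_group)) / pooled_sd(all_groups)


# Hypothetical scores for two adjacent groups.
group_a = [1, 2, 3]
group_b = [4, 5, 6]
print(effect_size(group_a, group_b, [group_a, group_b]))
```

An ES above unity therefore means that adjacent group means lie more than one pooled standard deviation apart.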



Insert Table 3 about here



Pearson product-moment intercorrelations were computed among the dictation and copytest total scores, the log ability scores of students with nonextreme dictation and copytest scores, the subparts and total of the IEPT (dictation, structure, and cloze tests), and the subparts and total of the TOEFL (listening comprehension, structure, and reading comprehension tests). The upper triangle of Table 4 gives correlation coefficients using all available data. Since students from Groups ADV or NS had recently taken the IEPT [illegible] involving these tests were based on relatively small numbers of students (18 to 25). Also, since the correlation coefficients in the upper triangle of Table 4 are based on different numbers and subgroups of students, coefficients based on data from the same [illegible] 15 students from Groups BEG and INT who took all tests listed in the table were also computed and are presented in the lower triangle.

Among these correlations, of particular interest is that both the
dictation raw scores and dictation log ability scores had moderately high
correlations with the TOEFL listening comprehension test (.84 and .82 on
the upper and .7[?] and .82 on the lower triangle, respectively).
Similarly, the copytest raw scores and the copytest log ability scores
correlated quite well with the TOEFL reading comprehension test (.68 and
.67 on the upper and .65 and .70 on the lower triangle, respectively).
Also, while the dictation and copytest used completely different methods
for presenting the test text (auditory vs. visual), correlations between
both raw and log ability scores were quite high when based on all
available data (.89 and .79 for raw scores and log ability scores,
respectively). Finally, on the lower triangle (where the
intercorrelations can be more meaningfully compared since they are based
on the same group of students), the log ability scores of the dictation
and copytest had, with only one exception out of 20 comparisons,
uniformly higher correlations with the remaining language tests than did
the simple total (raw) scores of the dictation and copytest. While these
differences were not great (ranging from .03 to .08), they do suggest
that the non-linear transformations of total dictation and copytest
scores provided by the Rasch analysis were better predictors of
performance on other measures of language proficiency than were the
simple total scores.
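To make the phrase "non-linear transformation" concrete, the following sketch converts a raw score into a Rasch log ability, given fixed item difficulties. It is a minimal illustration under stated assumptions, not the BICAL procedure used in the study: the difficulties are the dictation values read from Table 1, the ability is found by simple bisection, and the names `expected_score` and `log_ability` are hypothetical.

```python
import math

# Rasch item difficulties (logits) for the 13 dictation items, as read from Table 1.
DIFFICULTIES = [-3.05, -0.55, -3.99, -1.58, -0.81, -0.13, 1.04,
                -0.41, -0.41, 3.00, 2.28, 1.99, 2.62]


def expected_score(theta, difficulties):
    """Expected raw score at ability theta under the Rasch model."""
    return sum(1.0 / (1.0 + math.exp(b - theta)) for b in difficulties)


def log_ability(raw_score, difficulties, low=-15.0, high=15.0, iterations=100):
    """Ability whose expected raw score equals the observed raw score."""
    k = len(difficulties)
    if not 0 < raw_score < k:
        # Zero and perfect scores have no finite estimate, which is why
        # extreme scores are excluded from the log ability analyses.
        raise ValueError("log ability is undefined for zero or perfect scores")
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if expected_score(mid, difficulties) < raw_score:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    for raw in range(1, len(DIFFICULTIES)):
        print(raw, round(log_ability(raw, DIFFICULTIES), 2))
```

The mapping is monotone, so it preserves the rank order of raw scores, but the distance between successive log abilities is not constant, which is what allows the two kinds of scores to correlate differently with other tests.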

...... . ' . ; ; ' . * .' \ . : . . : ; ■ 

At this point it seems appropriate to address two concerns arising
from the nature of these two novel testing procedures. Since Table 1
shows a clear relationship between item length and difficulty for the two
scales, it may seem that longer items were in general more difficult
simply because they presented more opportunities to err than the shorter
items. Also, since the items gradually increased in length throughout
each test, culminating in a segment of 19 words, these tests may in some
respects appear more like tests of short-term memory than of language
proficiency. While both of these concerns have some validity, it is
nevertheless the case that the longer items did very well in
discriminating Group NS from all of the ESL students. For example, on the
dictation test all 17 Group NS students passed the last item while only
one ESL student from Group ADV did so, while on the copytest 15 out of 17
Group NS students passed the last item while only 10 out of 50 ESL
students did so (8 from Group ADV, 2 from Group INT, and 0 from Group
BEG). Therefore, unless there is some reason to believe that native
English-speaking American students have uniformly better short-term
memories than foreign students, it appears more reasonable to conclude
that it is the different levels of language knowledge represented in the
sample that are responsible for the variation in test scores (see Table
3). It is this knowledge which is necessary for the comprehension and
"chunking" of the words in each item which permits their retention in
short-term memory (see Miller, 1956). Also, while it cannot be denied
that longer items present more opportunities for error, this is also
likely the case for most mental tests where more difficult items (e.g.,
reading test items requiring the integration of many pieces of
information as well as inferencing skills; mathematics problems requiring
many computational steps) generally present many more opportunities for
error than easier items. Thus it could be argued (as the preceding
validity analyses suggest) that the longer items were more difficult for
the "right" reason in that they require exactly [...] mental [...] which
is made possible only by knowledge of the English language [...] in all
forms of comprehending and producing language.

In summary, the group mean differences and intercorrelations
reported in this section suggest that the dictation and copytest are
valid measures of language proficiency and that the Rasch log ability
transformations of the total dictation and copytest scores have slightly
higher validity as measures of language proficiency than do the raw total
scores of these two tests.

Re-analysis of Cziko (1982) Data

Since the results reported above are based on relatively small
numbers of students and since the dictation and copytest used the same
text segmented in identical ways, the dictation and copytest data
reported by Cziko (1982) were re-analyzed in an attempt to replicate
these findings using the same methods of scale analysis. These data were
collected using a dictation and copytest based on a different text than
the one used above, consisting of 14 items ranging in length from 2 to 21
words. A total of 102 students were administered the dictation and a
smaller group of 34 students took the copytest. As above, these students
represented beginning to native-speaker proficiency in English.
Excluding all students with either perfect or zero test scores from the
analyses left 87 and 33 students for the dictation and copytest,
respectively.






Insert Tables 5 and 6 about here



The results of the analyses of these dictation and copytest items and
scales are presented in Tables 5 and 6. As can be seen on Table 5, all 14
items of the dictation and all but the second item of the copytest
appeared to have acceptable indices of Rasch fit and discrimination as
well as [...] expected Rasch fit were .21 and .19 for the dictation and
copytest, respectively). While the corrected item-total point-biserial
correlations (r_it) were lower for extreme items, the other indices
indicate that except for the second copytest item, these extreme items
were nonetheless homogeneous within their respective scales. As shown on
Table 6, both tests had estimates of psychometric reliability ranging
from .85 to .93 with high H values of .76 and .61 for the dictation and
copytest, respectively, indicating that both tests formed strong
cumulative, homogeneous scales. With respect to the construct validity of
the dictation test, Cziko (1981) reported that group mean differences and
correlations with other measures of language proficiency (ranging from
.75 to .86) supported its validity as a measure of language proficiency.
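For readers unfamiliar with the two kinds of scale index reported in Table 6, the sketch below computes Cronbach's alpha and Loevinger's H for a small matrix of dichotomous responses. These are the standard textbook definitions, not the programs used in the study; the function names and the toy data are hypothetical, and H is written in its Guttman-error form (one minus the ratio of observed to expected errors under independence).

```python
def cronbach_alpha(matrix):
    """Cronbach's alpha for a persons-by-items matrix of 0/1 scores."""
    k = len(matrix[0])

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_variance = sum(variance([row[i] for row in matrix]) for i in range(k))
    total_variance = variance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - item_variance / total_variance)


def loevinger_h(matrix):
    """Loevinger's H: 1 - observed / expected Guttman errors over item pairs."""
    n = len(matrix)
    k = len(matrix[0])
    p = [sum(row[i] for row in matrix) / n for i in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            easy, hard = (i, j) if p[i] >= p[j] else (j, i)
            # A Guttman error: failing the easier item but passing the harder one.
            observed += sum(1 for row in matrix if row[easy] == 0 and row[hard] == 1)
            expected += n * (1 - p[easy]) * p[hard]
    return 1 - observed / expected


# Toy data: five examinees, four items ordered from easiest to hardest,
# forming a perfect cumulative (Guttman) pattern.
PERFECT_SCALE = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]

if __name__ == "__main__":
    print(cronbach_alpha(PERFECT_SCALE), loevinger_h(PERFECT_SCALE))
```

A perfectly cumulative set of items gives H = 1 because no examinee fails an easier item while passing a harder one; values such as the .76 and .61 reported above indicate that such errors occurred far less often than independence would predict.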



Discussion
The results of this research have provided evidence that relatively
short scales can be constructed to provide useful measures of language
proficiency. It appears that such scales can be easily constructed by
manipulating the length of segments of coherent text, presenting the
segments either auditorily (as for the dictation) or visually (as for the
copytest), and requiring the examinees to write down what they recall
after the presentation of each segment.
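The segment-by-segment procedure described above lends itself to dichotomous scoring of each item. The sketch below is only an illustration: it assumes that a segment counts as passed when every word is reproduced in the original order, ignoring capitalization and punctuation. That pass criterion, and the names `normalize`, `score_segment`, and `total_score`, are assumptions made for the example and are not taken from this section of the paper.

```python
import re


def normalize(text):
    """Lower-case the text and keep only its words (punctuation is dropped)."""
    return re.findall(r"[a-z']+", text.lower())


def score_segment(target, response):
    """Return 1 if the response reproduces the target word for word, else 0."""
    return int(normalize(target) == normalize(response))


def total_score(targets, responses):
    """Number of segments passed; one point per segment (item)."""
    return sum(score_segment(t, r) for t, r in zip(targets, responses))


if __name__ == "__main__":
    targets = ["used to wander", "over our country"]
    responses = ["Used to wander.", "over the country"]
    print(total_score(targets, responses))
```

Scoring whole segments in this way yields one pass or fail decision per item, which is the form of data required by the scale analyses reported earlier.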

The use of homogeneous, cumulative scales to measure language
proficiency has a number of both theoretical and practical advantages
over most other language testing techniques. First, the homogeneity and
cumulativeness of a set of such items can be considered evidence of its
unidimensionality, a quality which is important for all measures of
ability and yet has been found to be quite difficult to reliably assess
using even sophisticated factor analytic and latent trait procedures (see
Hambleton, 1983).

Second, since the items of an ability test can only be cumulative if
the scale includes items from all along the difficulty continuum from
very easy to very difficult, such a test can be used for students
representing a very broad range of language proficiency, ranging from
very poor to native-speaker proficiency. This cumulativeness of the items
also assures that an individual's total score is a good predictor of
responses to each of the individual items. This makes test scores more
directly comparable and meaningful since two individuals obtaining the
same total test score on a cumulative scale will have a similar pattern
of responses to individual test items. Cumulativeness also makes it
possible to examine the response patterns of individuals for evidence of
inattentiveness to the test or cheating. Such behavior would be indicated
by a response pattern characterized by the failing of easy items and the
passing of more difficult items (see Harnisch, 1983, for a detailed
discussion of unusual item response patterns, how they can be quantified,
and their implications for testing and instruction).
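One simple way to quantify such an unusual pattern is to count, for a single examinee, the pairs of items in which an easier item was failed and a harder item passed. The sketch below is a minimal illustration of that count; it is not the index discussed by Harnisch (1983), and the function name `guttman_errors` is hypothetical.

```python
def guttman_errors(responses):
    """Count (easier failed, harder passed) pairs in one response pattern.

    `responses` is a list of 0/1 scores ordered from the easiest item
    to the most difficult item.
    """
    errors = 0
    for position, easier in enumerate(responses):
        for harder in responses[position + 1:]:
            if easier == 0 and harder == 1:
                errors += 1
    return errors


if __name__ == "__main__":
    print(guttman_errors([1, 1, 1, 0, 0]))  # consistent with a cumulative scale
    print(guttman_errors([0, 0, 1, 1, 1]))  # easy items failed, hard items passed
```

A pattern that is consistent with a cumulative scale yields a count of zero, so examinees with large counts relative to others with the same total score are the ones whose responses merit a closer look.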



Also, these scales are amenable to Rasch analysis since guessing is
not a factor and, with few exceptions, items were found to have
consistently high discrimination. The log ability scores provided by the
Rasch analysis correlated in general more highly with other measures of
language proficiency than did the raw dictation and copytest scores.
Since the estimation of only one parameter for each item is less
demanding in terms of the number of examinees required than two- and
three-parameter item response models, the Rasch method as employed here
can be used in settings where relatively small numbers of students would
make two- and three-parameter approaches inappropriate.

While the three approaches used to analyze [...] these scales tended
to converge in signalling the same items as suspect, the corrected
point-biserial item-total correlations were influenced by the centrality
of test items (always giving lower coefficients to very easy or very hard
items) while the Rasch and Mokken indicators were not so influenced.
Since point-biserial correlations are so influenced by item difficulty,
the Rasch and/or Mokken indices as used here appear more appropriate for
analyzing items included in a scale of items with widely ranging
difficulties.
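The dependence of the corrected point-biserial correlation on item difficulty can be seen even in a perfectly cumulative data set. The sketch below uses hypothetical toy data (five examinees, four items) and hypothetical function names; it correlates each item with the total score on the remaining items, which is the usual meaning of a "corrected" item-total correlation.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(matrix, item):
    """Correlation of one item with the total score of all other items."""
    item_scores = [row[item] for row in matrix]
    rest_scores = [sum(row) - row[item] for row in matrix]
    return pearson(item_scores, rest_scores)


# Toy data: a perfect cumulative scale, items ordered easiest to hardest.
PERFECT_SCALE = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]

if __name__ == "__main__":
    for item in range(4):
        print(item + 1, round(corrected_item_total(PERFECT_SCALE, item), 3))
```

In this toy example the easiest and hardest items receive lower coefficients than the two middle items, although every item fits the cumulative pattern equally well.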

Among the practical advantages of the language testing procedures
investigated in this research are the ease and speed with which these
tests can be scored in comparison to traditional scoring methods which
require attention to each individual word of the test passage. This
segment scoring procedure is significantly faster than the usual
dictation scoring procedure which treats each word of the test passage as
a separate item. This feature, along with the high reliability and
validity of the procedures studied in this research, resulted in a
decision to replace the traditionally scored dictation test of the IEPT
with the dictation test used in Cziko's [...]

Finally, since these language testing measures are based on coherent
text, they are relatively [...] also allow the flexibility [...] either
written texts or dialogues, depending on the type of language [...] to
test. While such text reconstruction tasks have in the past employed
primarily written expository and narrative texts, there is no apparent
reason why texts based on naturally occurring oral language could not be
used as well. Thus, it appears possible to use such techniques to test
proficiency in a wide variety of styles, dialects, and registers of the
target language.



Even though both of the language testing techniques investigated in
this research involved writing as the response, there is no reason why
oral production could not be used as the response mode should it be
desirable not to involve writing. While oral language production in
response to auditorily presented language has been used in tests of
elicited imitation (Swain, Dumas, & Naiman, 1974), the authors are not
familiar with tests of second language proficiency that have used oral
reading tasks. Research is needed to determine whether such language
testing procedures, modified according to the techniques used in this
study, would have the same desirable characteristics as the dictation and
copytest procedures investigated here. If this is the case, we would then
have four powerful and practical language testing procedures for
measuring language proficiency which involve either the auditory or
visual presentation of language and either writing or oral production as
response modes.




The present research has demonstrated that relatively short,
cumulative scales can provide reliable and valid measures of language
proficiency. It is hoped that this finding will encourage measurement
specialists to investigate ways in which such scales can be used to
measure other cognitive variables and to no longer consider these
measurement techniques appropriate only for the measurement of affective
variables in the way that Guttman scales have been primarily used.






References

Clark, H. H., & Clark, E. V. (1977). Psychology and [...]. New
    York: Harcourt Brace Jovanovich.
Cziko, G. A. (1981). Psychometric and edumetric approaches [...]
    testing: Implications and applications. [...]
[...] Improving the psychometric [...] and practical [...] language
    tests. [...] Quarterly, 16(3), 367-379.
Cziko, G. A. (198[?]). An improvement over Guttman [...] A computer
    program for evaluating cumulative, non[...] of dichotomous items.
    Educational and Psychological [...]
Guttman, L. (1947). The Cornell technique for scale and [...]
Guttman, L. (1950). The basis for scalogram analysis. [...]
    L. Guttman, E. Suchman, P. Lazarsfeld, [...] Measurement and
    prediction. Princeton, N.J.: Princeton [...] Press.
Hambleton, R. K. (1983, April). Assessing dimensionality [...]
    items. Paper presented at the meeting of the American [...]
    Research Association, Montreal.
Harnisch, D. L. (1983). Item response patterns: Applications [...]
    educational practice. Journal of Educational Measurement, [...]
Hull, C. H., & Nie, N. H. (19[?]). SPSS update 7-9. New [...]
    Hill.
Lado, R. (1961). Language testing. New York: McGraw-Hill.




Loevinger, J. (1947). A systematic approach to the construction and
    evaluation of tests of ability. Psychological Monographs, 61(4).
Loevinger, J. (1948). The technic of homogeneous tests compared with
    some aspects of "scale analysis" and factor analysis. Psychological
    Bulletin, 45, 507-530.
Lugton, R. (1978). American topics. Englewood Cliffs, N.J.: Prentice-
    Hall.
Miller, G. A. (1956). The magical number seven, plus or minus two.
    Psychological Review, 63, 81-97.
Mokken, R. J. (1971). A theory and procedure of scale analysis. The
    Hague: Mouton.
Mokken, R. J., & Lewis, C. (1982). A nonparametric approach to the
    analysis of dichotomous item responses. Applied Psychological
    Measurement, 6, 417-430.
Neisser, U. (1967). Cognitive psychology. New York: Appleton-Century-
    Crofts.
Oller, J. W., Jr. (1972). Dictation as a test of ESL proficiency. In
    H. B. Allen & R. N. Campbell (Eds.), Teaching English as a second
    language. New York: McGraw-Hill.
Oller, J. W., Jr. (1979). Language tests at school. London: Longman.
Oller, J. W., Jr., & Streiff, V. (1975). Dictation: A test of grammar
    based expectancies. In R. L. Jones & B. Spolsky (Eds.), Testing
    language proficiency. Arlington, VA: Center for Applied Linguistics.
Savignon, S. J. (1982). Dictation as a measure of communicative
    competence in French as a second language. Language Learning, 32,
    33-51.






Spolsky, B. (1978). Introduction: Linguists and language [...] In
    B. Spolsky (Ed.), Approaches to language testing. Arlington, VA:
    Center for Applied Linguistics.
Swain, M., Dumas, G., & Naiman, N. (1974). Alternatives to spontaneous
    speech: Elicited translation and imitation as indicators of second
    language competence. Working Papers on Bilingualism, 3, 76-90.
    (ERIC Document Reproduction Service No. ED 123 872)
Wright, B. D., & Mead, R. J. (1977). BICAL: Calibrating items and
    scales with the Rasch model (Research Memorandum No. 23). Chicago:
    University of Chicago, Department of Education.






Author Notes

Correspondence concerning this article should be sent to Gary A.
Cziko, University of Illinois, Department of Educational Psychology, 1310
S. Sixth St., Champaign, IL 61820.

The current address of Nien-Hsuan Jennifer Lin is 132 Rancho Drive
#275, San Jose, CA 95111.







Table 1

Characteristics^a of Dictation and Copytest Items

Sequence  Length    p    r_it    α    Difficulty    Fit   Discrimination   H_i

Dictation (n=47)
    1        2     .83    .19   .83     -3.05      1.23        .70         .33
    2        3     .45    .38   .82      -.55      1.21        .93         .34
    3        3     .91    .13   .83     -3.99      1.35        .70         .31
    4        3     .62    .49   .81     -1.58       .68       1.06         .53
    5        4     .49    .60   .80      -.81       .59       1.48         .54
    6        6     .38    .56   .80      -.13       .73       1.05         .50
    7        5     .23    .50   .81      1.04       .86       1.16         .51
    8        7     .43    .59   .80      -.41       .63       1.03         .51
    9       10     .43    .54   .80      -.41       .81       1.03         .47
   10       11     .09    .45   .81      3.00       .63       1.04         .64
   11       15     .13    .59   .80      2.28       .29       1.06         .69
   12       17     .15    .51   .81      1.99       .58       1.08         .58
   13       19     .11    .48   .81      2.62       .55       1.05         .61

Copytest (n=56)
    1        2     .98    .04   .85     -5.15      1.75        .99         .14
    2        3     .91    .25   .85     -3.31       .88       1.04         .53
    3        3     .95   -.11   .86     -3.96      3.16        .02        -.28
    4        3     .63    .60   .83      -.64       .61       1.01         .62
    5        4     .55    .58   .83      -.11       .70       1.03         .54
    6        6     .54    .80   .81       .03       .31       1.42         .71
    7        5     .52    .50   .84       .16      1.16        .84         .46
    8        7     .52    .72   .82       .16       .48       1.43         .64
    9       10     .41    .58   .83      1.00      1.05        .97         .56
   10       11     .23    .60   .83      2.60      1.15       1.03         .6[?]
   11       15     .18    .62   .83      3.19       .24       1.13         .80
   12       17     .11    .41   .84      4.11       .39       1.10         .68
   13       19     .38    .49   .84      1.91      1.13        .98         .52

Note. Data from students obtaining zero or perfect scores were
excluded from these analyses.

^a Definitions of these item characteristics are: sequence = order of
item in passage; length = number of words in item; p = proportion of
students passing item; r_it = corrected item-total point-biserial
correlation; α = internal consistency of test with item deleted; fit =
Rasch total mean-square fit; discrimination = Rasch item discrimination;
H_i = Mokken index of item homogeneity.






Table 2

Reliability and Homogeneity of Dictation and Copytest Scales

Characteristic                           Dictation     Copytest

Cronbach's α                                .86           .86
Spearman-Brown split-half reliability       .86           .86
Guttman split-half reliability              .86       [illegible]
Guttman largest λ                       [illegible]   [illegible]
Loevinger's H                               .50           .58

Note. The data of the same students included in Table 1 were included
in these analyses.






Table 3

Dictation and Copytest Results

                                          Estimated limits of difference
                                          between means (C = .95)^b
Group     n       M       SD     ES^a     Lower limit     Upper limit

Dictation
BEG      13     1.00     1.53
                                 1.31        1.21            3.63
INT      12     3.42     1.38
                                 1.33         .86            4.06
ADV      25     5.88     2.54
                                 3.63        5.41            8.01
NS       17    12.59      .87

Copytest
BEG      13     2.54     1.66
                                 1.22         .97            3.95
INT      12     5.00     1.95
                                 1.23         .77            4.19
ADV      25     7.48     2.58
                                 2.31        3.29            5.99
NS       17    12.12     1.11

Note. Data from all students were included in these analyses.

^a ES for adjacent group means was calculated by subtracting the mean of
the less proficient group from the mean of the more proficient group and
dividing this difference by the pooled standard deviation of test scores
for all four groups.

^b These estimated limits are for adjacent means shown on rows
immediately above and below the row on which the limits are given.






Table 4

Intercorrelations Among Measures of Language Proficiency









— •;■ T[ ' • ..; ;. : 


■ * •' 'V '■ ' 


. Listening tests . ,,. : ^;^ : ' v ''-'"' ; 


heading tests ' 




. . Te'St :' " : ; 


1 2 •:. 3 4' 5:6. 


"Kr. 8 9 




~1. • Dictation total 


— ^.^-4XM9(^M4(B)-.WM3(5&)- 







.2.:' Dictation log ability .98. •% .27(18) .82(18) .81(47) .79(46) .62(18) .77(18) .46(18) .59(^ .68(18)-^^ 

3, IEPT dictation .23 .27 i r;; . ..64(23) 

4. TOEFL listening : '':-7-: :,y -7.[:\\y\r'Z'^ ; - / ' 

;fl .82 .49 • .45(23) .32(22)' .90(23) .71(23) .69(23) .74(23) 



compreqension 



.40 • 


' .35 




;99( 


.38.'' ; 


,.35 1 


.99 


* * 


.65 


.84 


.52 • 


.57 


.34 

.58 


.'63 
.62 


.42- 


.63 
.47 


.50 . 


.74 


.65 


.70 


.71. 


.81 . 


.58 


.64 








.58 


.85 ,:, 


•.55 


;: .60 



5. i-Copytest total ' .30 .34 

6. r Gop.ytest.log ability .33' '.37 . 38. ,.35* .99 . 43(22) .57(22) .37(22) .67(22) .50(22) .52(22) 
' 7. ' IEPT structure ,,55 -.62 .65 " ; % .52 • .57 .... S ' .73(25) .84(23) .80(23) .97(25). .94(23*) 
: 8; IEPT. cloze ' .74 .77 .34 .1 ,.55 .63 .68 ; ' ' .65(23) .78(23) .85(25) .80(23) 
;.9.. TOEFL structure .40 .45- .58 .62 .42- .47 ' .81 .' '.66 ■ .73(23)'. .83(23) * .89(23) 
10. TOEFL; reading'' 



\',52 .59 \ .fO . .74 .65 .70 .87 '.74 : .# ,84(23) .93(23) 
11 . 4 Ei^MlN^;; : '563v#: . 81 . .58 .64 .96 .82 , ,.83';, .87. ' : ';,, ; - - .94(23) 

• . k\ .68 ' .58 i -.55 : .60 >M .76 .89 .96. .93. ... - ' 



Note. Each coefficient above the main diagonal includes all students with
non-missing data for the two tests (the number of students for each
coefficient is given in parentheses). Each coefficient below the diagonal
includes the same 18 students from Groups BEG and INT for whom test
scores on all tests were available. All correlation coefficients greater
than .39 were significantly greater than zero (p < .05).






Table 5

Re-Analysis of Characteristics of Dictation and Copytest Items from Data
Collected by Cziko (1982)

Sequence  Length    p    r_it    α    Difficulty    Fit   Discrimination   H_i

Dictation (n=87)
    1        2     .94    .18   .91     -7.69       .49       1.13         .68
    2        4     .80    .35   .91     -5.70       .61       1.14         .78
    3        4     .68    .51   .90     -4.25       .34       1.14         .89
    4        6     .39    .68   .89      -.99       .88       1.03         .79
    5        5     .33    .75   .89      -.34       .42       1.03         .81
    6        8     .25    .52   .90       .68      1.36        .95         .53
    7        8     .25    .78   .89       .68       .32       1.03         .78
    8        7     .22    .77   .89      1.19       .36       1.03         .77
    9       10     .2[?]  .76   .89      1.37       .38       1.03         .76
   10       10     [?]    .81   .89      1.56       .22       1.03         .81
   11       13     .18    .72   .89      1.75       .33       1.03         .74
   12       14     .11    .60   .90      3.02       .31       1.04         [?]
   13       18     .10    .62   .90      3.25       .17       1.04         .83
   14       21     .02    .27   .90      5.47       .19       1.02         .76

Copytest (n=33)
    1        2     .94    .43   .84     -4.62       .13       1.08         .84
    2        4     .85    .14   .86     -2.89      3.73       -.14         .19
    3        4     .88    .62   .83     -3.38       .12       1.08         .87
    4        6     .73    .59   .83     -1.48       .74       1.00         .62
    5        5     .76    .52   .84     -1.77       .99        .93         .57
    6        8     .39    .57   .83      1.03       .64        .87         .61
    7        8     .61    .70   .82      -.49       .44       1.33         .69
    8        7     .67    .68   .83      -.96       .44       1.30         .68
    9       10     .52    .72   .82       .17       .41       1.45         .71
   10       10     .18    .40   .84      2.76       .63       1.09         .59
   11       13     .42    .62   .83       .81       .56       1.20         .64
   12       14     .27    .29   .85      1.94      1.56        .91         .37
   13       18     .06    .33   .85      4.44       .16       1.04         .74
   14       21     .06    .17   .85      4.44      1.72      [?].04        .39

Note. Data from students obtaining zero or perfect scores were
excluded from these analyses. See note of Table 1 for definitions of item
characteristics.



Table 6

Re-Analysis of Reliability and Homogeneity of Dictation and Copytest Scales
from Data Collected by Cziko (1982)

Characteristic                           Dictation     Copytest

Cronbach's α                                .90           .85
Spearman-Brown split-half reliability       .92           .89
Guttman split-half reliability              .92           .88
Guttman's largest λ                         .93           .91
Loevinger's H                               .76           .61

Note. The data of the same students included in Table 5 were included
in these analyses.




Appendix A

Test Passage

[...] animals / used to wander / over our country / in uncounted
numbers. / Today these animal populations / have decreased to a great
extent. / Some animals have disappeared altogether, / destroyed by the
advance of human civilization. / The same story can be told in the
African continent, / once covered with big game such as elephant,
buffalo, and antelope. / In Central and South America, where animals were
once thought safe, they are now threatened. / In the last three
centuries, over two hundred species of mammals, birds, and reptiles have
become extinct. / Our wild animals are being swept from the land, the
birds from the air, the fish from the sea.

Note. The boundaries of the 13 segments (items) of the test passage
are indicated by lines (/).
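As a consistency check on the segmentation, the number of words in each segment of the passage can be counted and compared with the item lengths listed in Table 1. The sketch below is illustrative only; the first word of the passage is not legible in the source and is represented by the placeholder `[...]`, which is counted as one word, and the variable and function names are hypothetical.

```python
# Test passage with segment boundaries marked by "/".
# "[...]" stands for the first word, which is illegible in the source.
PASSAGE = (
    "[...] animals / used to wander / over our country / in uncounted "
    "numbers. / Today these animal populations / have decreased to a great "
    "extent. / Some animals have disappeared altogether, / destroyed by the "
    "advance of human civilization. / The same story can be told in the "
    "African continent, / once covered with big game such as elephant, "
    "buffalo, and antelope. / In Central and South America, where animals "
    "were once thought safe, they are now threatened. / In the last three "
    "centuries, over two hundred species of mammals, birds, and reptiles "
    "have become extinct. / Our wild animals are being swept from the land, "
    "the birds from the air, the fish from the sea."
)

# Item lengths (in words) as listed for the 13 items in Table 1.
TABLE_1_LENGTHS = [2, 3, 3, 3, 4, 6, 5, 7, 10, 11, 15, 17, 19]


def segment_lengths(passage):
    """Split the passage at "/" and count the words in each segment."""
    return [len(segment.split()) for segment in passage.split("/")]


if __name__ == "__main__":
    print(segment_lengths(PASSAGE) == TABLE_1_LENGTHS)
```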







